This vignette describes a standard processing pipeline for labeled
proteomics data with the pmartR package, including
normalization to the reference pool, quality control, global
normalization, and protein rollup. After normalization to the reference
pool, the pipeline is identical to a typical processing pipeline for
unlabeled or global proteomics data, and very similar to processing for
metabolomics and lipidomics data. A separate vignette, “RNAseq Data
Processing”, describes the typical processing steps for seqData objects.
For more details about the statistical analyses available within
pmartR see the “Statistical Analysis with pmartR”
vignette.
Labeled Proteomics Data
Below we load in 3 example data frames from the
pmartRdata package in order to create an isobaricpepData
object. For more details on omicsData object creation, see the “Data
Object Creation” vignette. These data consist of samples from three
different strains of a virus, plus pooled reference samples. Since this
is only a subset of data from this study, the number of samples is not
identical across TMT plexes, and the TMT plex names are not
sequential.
For these data, the column named “Peptide” denotes the peptide
identifier in e_data.
head(edata)
The f_data data frame contains the sample identifier, “SampleID”,
mapping to the column names in e_data other than “Peptide”. The TMT plex
on which each sample was run is also included in f_data, as this will be
required for normalization to the reference pool. The run order, which
was spread across multiple days in this case, is also included in f_data
although this information will not be used in our analyses. The column
for “Virus” takes on 1 of 3 values, and will be used as our main effect,
since comparisons of interest in this experiment are between virus
strains. Finally, f_data contains information about the Donor and
Replicate, which describe the provenance of the samples and biological
replicate information.
head(fdata)
The e_meta data frame contains just 2 columns, “Protein” and
“Peptide”. The “Peptide” column contains the same peptides found in
e_data, and “Protein” provides the mapping of the peptides to the
protein level.
head(emeta)
Since these are isobaric labeled (TMT) peptide data, we will create
an isobaricpepData object in pmartR. Next, we
log2 transform the data.
Log transforming the data prior to analysis is highly recommended,
and pmartR supports log2, log10, and natural logarithm
transformations. The edata_transform() function provides
this capability. We can also use edata_transform() to
transform the data back to the abundance scale if needed. Note that the
scale of the data is stored and automatically updated in the
data_info$data_scale attribute of the omicsData object.
mypep <- as.isobaricpepData(e_data = edata,
f_data = fdata,
e_meta = emeta,
edata_cname = "Peptide",
fdata_cname = "SampleID",
emeta_cname = "Protein",
data_scale = "abundance")
# log2 transform the data
mypep <- edata_transform(omicsData = mypep, data_scale = "log2")
We can use the summary function to get basic summary
information for our isbaricpepData object.
summary(mypep)
Normalize to Reference Pool
One of the first steps when analyzing labeled proteomics data is to
normalize to the reference pool samples. We use the pmartR
normalize_isbaric function to first create an
isobaricnormRes object, but not apply the normalization.
This allows us to examine the reference pool samples prior to using them
for this normalization step.
# Don't apply the normalization quite yet; can use summary() and plot() to view reference pool samples
mypep_refpools <- normalize_isobaric(omicsData = mypep,
exp_cname = "Plex",
apply_norm = FALSE,
refpool_cname = "Virus",
refpool_notation = "Pool")
We can use the summary and plot functions
on our isobaricnormRes object, to make sure that there
isn’t anything concerning about these samples.
summary(mypep_refpools)
plot(mypep_refpools)

When we’re ready to actually perform the reference pool
normalization, we can do so as follows with the apply_norm
argument.
# Now apply the normalization; can use plot() to view the study samples after reference pool normalization
mypep <- normalize_isobaric(omicsData = mypep,
exp_cname = "Plex",
apply_norm = TRUE,
refpool_cname = "Virus",
refpool_notation = "Pool")
We can plot the samples after normalization to the reference
pool.
plot(mypep)

Workflow
Once the omicsData object is created, and after normalization to the
reference pool samples for labeled peptide data, a typical QC workflow
follows the figure below.
Main Effects
We are preparing this data for statistical analysis where we will
compare the samples belonging to one group to the samples belonging to
another, and so we must specify the group membership of the samples. We
do this using the group_designation() function, which
modifies our pepData object and returns an updated version of it. Up to
two main effects and up to two covariates may be specified, with one
main effect being required at minimum. For these example data, we
specify the main effect to be “Virus” so that we can make comparisons
between the different strains. Certain functions we will use below
require that groups have been designated via the
group_designation() function, and doing so makes the
results of plot and summary more
meaningful.
mypep <- group_designation(omicsData = mypep, main_effects = "Virus")
summary(mypep)
plot(mypep, order_by = "Virus", color_by = "Virus")

The group_designation() function creates an attribute of
the dataset as follows:
attributes(mypep)$group_DF
Filter Biomolecules
It is often good practice to filter out biomolecules that do not meet
certain criteria, and we offer up to 5 different filters to do this.
Each of the filter functions calculates metric(s) that can be used to
filter out biomolecules and returns an S3 object. Using the
summary() function produces a summary of the metric(s) and
using the plot() function produces a graph. Filters that
require a user-specified threshold in order to actually filter out
biomolecules have corresponding summary and
plot methods that take optional argument(s) to specify that
threshold. Once one of the filter functions has been called, the results
of that function can be used in conjunction with the
applyFilt() function to actually filter out peptides based
on the metric(s) and user-specified threshold(s) and create a new,
filtered omicsData object.
Proteomics Filter
The proteomics filter is applicable only to peptide level data
(isobaricpepData or pepData objects) that
contain the e_meta component, as this filter counts the
number of peptides that map to each protein and/or the number of
proteins to which each individual peptide maps.
Using the argument degen_peps = TRUE, we see from the
summary that each peptide maps to a single protein, so
there are no redundant peptides to remove.
myfilter <- proteomics_filter(mypep)
summary(myfilter, degen_peps = TRUE)
##
## Summary of Proteomics Filter
##
## Obs Proteins Per Peptide Obs Peptides Per Protein
## Min. 1 1
## 1st Qu. 1 1
## Median 1 5
## Mean 1 13.2369764438157
## 3rd Qu. 1 17
## Max. 1 639
##
## Filtered 0 0
## Not Filtered 215220 16259
plot(myfilter, bw_theme=TRUE)

Using the argument min_num_peps, we can restrict our
dataset to just those proteins with a minimum number of peptides mapping
to them. A common threshold to use is 2, as this ensures that only
proteins with at least 2 peptides mapping to them are retained. We can
apply the proteomics filter using the applyFilt
function.
summary(myfilter, min_num_peps = 2)
##
## Summary of Proteomics Filter
##
## Obs Proteins Per Peptide Obs Peptides Per Protein
## Min. 1 1
## 1st Qu. 1 1
## Median 1 5
## Mean 1 13.2369764438157
## 3rd Qu. 1 17
## Max. 1 639
##
## Filtered 0 4984
## Not Filtered 215220 11275
# apply the filter, requiring proteins to have at least 2 peptides that map to them
mypep <- applyFilt(filter_object = myfilter, omicsData = mypep, min_num_peps = 2)
Molecule Filter
The molecule filter allows the user to remove from the dataset any
biomolecule not seen in at least min_num samples. The user
may specify a threshold of the minimum number of times each biomolecule
must be observed across all samples; the default value is 2. See the
“Filter Functionality” vignette for more detail about this filter, and
how to account for main effects and/or batches.
myfilter <- molecule_filter(mypep)
summary(myfilter, min_num = 3)
##
## Summary of Molecule Filter
## ----------------------------------
## Minimum Number:3
## Filtered:60349
## Not Filtered:149887
##
## num_observations frequency_counts
## 1 60340
## 2 60349
## 3 60361
## 4 60380
## 5 60405
## 6 60446
## 7 60505
## 8 60593
## 9 60822
## 10 66711
## 11 105829
## 12 105844
## 13 105855
## 14 105865
## 15 105890
## 16 105922
## 17 105957
## 18 106022
## 19 106120
## 20 106345
## 21 111160
## 22 135001
## 23 136474
## 24 136480
## 25 136495
## 26 136518
## 27 136549
## 28 136589
## 29 136636
## 30 136742
## 31 136966
## 32 139884
## 33 159015
## 34 165736
## 35 165750
## 36 165758
## 37 165765
## 38 165785
## 39 165804
## 40 165833
## 41 165910
## 42 166014
## 43 166299
## 44 172725
## 45 210236
plot(myfilter, bw_them = TRUE)

Setting the threshold to 3, we would filter 60,349 peptides out of
the dataset. If we’d like to make the filter less stringent, we could
use a threshold of 2. We use the applyFilt function to
apply the molecule filter.
summary(myfilter, min_num = 2)
##
## Summary of Molecule Filter
## ----------------------------------
## Minimum Number:2
## Filtered:60340
## Not Filtered:149896
##
## num_observations frequency_counts
## 1 60340
## 2 60349
## 3 60361
## 4 60380
## 5 60405
## 6 60446
## 7 60505
## 8 60593
## 9 60822
## 10 66711
## 11 105829
## 12 105844
## 13 105855
## 14 105865
## 15 105890
## 16 105922
## 17 105957
## 18 106022
## 19 106120
## 20 106345
## 21 111160
## 22 135001
## 23 136474
## 24 136480
## 25 136495
## 26 136518
## 27 136549
## 28 136589
## 29 136636
## 30 136742
## 31 136966
## 32 139884
## 33 159015
## 34 165736
## 35 165750
## 36 165758
## 37 165765
## 38 165785
## 39 165804
## 40 165833
## 41 165910
## 42 166014
## 43 166299
## 44 172725
## 45 210236
mypep <- applyFilt(filter_object = myfilter, mypep, min_num = 2)
summary(mypep)
Coefficient of Variation Filter
The coefficient of variation (CV) filter calculates the pooled CV
values as in (Ahmed 1995).
The user can then specify a CV threshold, above which peptides are
removed.
myfilter <- cv_filter(mypep)
summary(myfilter, cv_threshold = 140)
##
## Summary of Coefficient of Variation (CV) Filter
## ----------------------
## CVs:
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.231 30.605 35.206 38.178 41.904 246.023
##
## Total NAs: 7
## Total Non-NAs: 149889
##
## Number Filtered Biomolecules: 146
plot(myfilter, cv_threshold = 140)

mypep <- applyFilt(filter_object = myfilter, omicsData = mypep, cv_threshold = 140)
summary(mypep)
Custom Filter
Sometimes it is known a priori that certain peptides or samples
should be filtered out of the dataset prior to statistical analysis.
Perhaps there are known contaminant proteins, and so peptides mapping to
them should be removed, or perhaps something went wrong during sample
preparation for a particular sample. On the other hand, it may be
preferred to specify peptides or samples to keep (removing those not
explicitly specified), and this can also be accomplished. Keep in mind
that both ‘remove’ and ‘keep’ arguments cannot be specified together;
either ‘remove’ arguments only or ‘keep’ arguments only may be specified
in a single call to custom_filter().
Here, we demonstrate the removal of contaminant proteins and their
associated peptides; the contaminant protein names contain “XXX”.
remove_prots <- mypep$e_meta$Protein[grep("XXX", mypep$e_meta$Protein)]
myfilter <- custom_filter(mypep, e_meta_remove = remove_prots)
summary(myfilter)
mypep <- applyFilt(filter_object = myfilter, omicsData = mypep)
Note that there is a summary() method for objects of
type custom_filt, but no plot() method.
IMD-ANOVA Filter
The IMD-ANOVA filter removes biomolecules that do not have sufficient
data for the statistical tests available in pmartR; these
are ANOVA (quantitative test) and an independence of missing data (IMD)
using a g-test (qualitative test). When using the
summary(), plot(), and
applyFilt() functions, you can specify just one filter
(ANOVA or IMD) or both, depending on the tests you’d like to perform
later.
First we create the filter object.
myfilter <- imdanova_filter(mypep)
Here we consider what the filter would look like for both ANOVA and
IMD.
summary(myfilter, min_nonmiss_anova = 2, min_nonmiss_gtest = 3)
##
## Summary of IMD-ANOVA Filter
##
## Total Observations: 149269
## Filtered: 27
## Not Filtered: 149242
Here we consider what the filter would look like for just ANOVA.
summary(myfilter, min_nonmiss_anova = 2)
##
## Summary of IMD-ANOVA Filter
##
## Total Observations: 149269
## Filtered: 42
## Not Filtered: 149227
Here we consider what the filter would look like for just IMD.
summary(myfilter, min_nonmiss_gtest = 3)
##
## Summary of IMD-ANOVA Filter
##
## Total Observations: 149269
## Filtered: 47
## Not Filtered: 149222
Now we apply the filter for both ANOVA and IMD.
mypep <- applyFilt(filter_object = myfilter, omicsData = mypep, min_nonmiss_anova = 2, min_nonmiss_gtest = 3)
## You have specified remove_singleton_groups = TRUE, but there are no singleton groups to remove. Proceeding with application of the IMD-ANOVA filter.
summary(mypep)
Filter Samples
To identify any samples that are potential outliers or anomalies (due
to sample quality, preparation, or processing circumstances), we use a
robust Mahalanobis distance (rMd) (Matzke et al.
2011) score based on 2-5 metrics. The possible metrics are:
The rMd score can be mapped to a p-value, and a p-value threshold
used to identify potentially outlying samples. In general, for
proteomics data we recommend using all 5 metrics (or sometimes leaving
out Kurtosis). A plot of the rMd values for each sample is generated,
and specifying a value for ‘pvalue_threshold’ results in a horizontal
line on the plot, with samples above the line slated for filtering at
the given threshold.
myfilter <- rmd_filter(omicsData = mypep, metrics = c("Correlation", "Proportion_Missing", "MAD", "Skewness"))
plot(myfilter)

summary(myfilter, pvalue_threshold = 0.0001)
##
## Summary of RMD Filter
## ----------------------
## P-values:
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2708 0.4910 0.4726 0.6911 0.9980
##
## Metrics Used: MAD, Skewness, Corr, Proportion_Missing
##
## Filtered Samples: StrainA_D3_R2, StrainB_D3_R2, StrainB_D3_R4
plot(myfilter, pvalue_threshold = 0.0001, bw_theme=TRUE)

Using the output from the summary() function, we can
explore the potential outliers identified to see which metrics are
driving their “outlier-ness”. Box plots for each metric are graphed,
with the specified sample marked with an X.
# get vector of potential outliers
potential_outliers <- summary(myfilter, pvalue_threshold = 0.0001)$filtered_samples
# loop over potential outliers to generate plots
if(length(potential_outliers) > 0){
for(i in 1:length(potential_outliers)){
print(plot(myfilter, sampleID = potential_outliers[i]))
}
}



StrainB_D3_R4 appears to have been flagged as a potential outlier
primarily on the basis of its MAD relative to the other samples.
StrainA_D3_R2 and StrainB_D3_R2 have similar profiles across the four
metrics, with high amounts of missing data, low correlations, and high
MAD and skew.
We can use this information to help determine whether to remove any
of these samples. It is often useful to look at additional data
summaries to inform outlier removal, such as a correlation heat map or
principal components analysis (PCA) plot.
In the correlation heatmap we are looking to see if any samples stand
out as having low correlation with a majority of the others; any such
samples would appear as dark stripes. We use the
interactive = TRUE argument so we can hover over the plot
and see which samples correspond to the cells in the heatmap. None of
the samples stands out as concerning in the correlation heatmap for
these data.
mycor <- cor_result(omicsData = mypep)
plot(mycor, interactive = TRUE)
The PCA function in pmartR utilizes (Stacklies et al. 2007) probabilistic PCA, which
which allows for missing data. In the PCA plot we are interested to see
whether any of the potential outliers identified by the rMd filter do
not cluster with their main effect group. Note that for large proteomics
datasets, such as this example, generating the PCA results may take up
to a minute or longer.
Using the interactive = TRUE argument to the plot
function, we can see that the three potential outliers identified by the
rMd filter are the same samples that do not cluster with the others
(paying attention to the R^2 values on the axes).
mypca <- dim_reduction(omicsData = mypep)
plot(mypca, interactive = TRUE)
Given the above exploratory analyses, we’d recommend removing all
three samples identified by the rMd filter. Note that if we had only
wanted to remove a subset of those samples, we could do so with a custom
filter, as opposed to applying the rMd filter.
mypep <- applyFilt(filter_object = myfilter, omicsData = mypep, pvalue_threshold = 0.0001)
summary(mypep)
Summarize Data
In addition to the correlation heatmap and PCA plot that we utilized
above, the pmartR package contains additional methods for
data summarization and exploration that can be used as part of the QC
process: numeric summaries and associated plots (via the
edata_summary function), missing data summarization (via
missingval_result, plot_missingval,
missingval_scatterplot and
missingval_heatmap).
Numeric Summaries
We can generate numeric summaries of our data by either sample or
molecule. The edata_summary function computes the mean,
standard deviation, median, percent of observations for which a value
was observed, the minimum value, and the maximum value.
edata_summary(omicsData = mypep, by = "sample", groupvar = NULL)
## mean
## sample mean
## 1 StrainC_D2_R1 -1.06143305348409
## 2 StrainB_D1_R1 0.205368519199358
## 3 StrainC_D1_R2 0.158855539253571
## 4 StrainA_D1_R2 0.280640981834678
## 5 StrainB_D2_R4 -0.1159542371296
## 6 ---- ----
## 38 StrainB_D1_R3 -0.112316793848367
## 39 StrainB_D2_R5 0.23073893517671
## 40 StrainB_D3_R3 -0.278105412627509
## 41 StrainA_D2_R3 0.121439492777355
## 42 StrainC_D2_R2 0.0971310640249695
##
## std_dev
## sample sd
## 1 StrainC_D2_R1 0.410541305359474
## 2 StrainB_D1_R1 0.447658581163237
## 3 StrainC_D1_R2 0.461345839153307
## 4 StrainA_D1_R2 0.461549205240424
## 5 StrainB_D2_R4 0.420595034818473
## 6 ---- ----
## 38 StrainB_D1_R3 0.427848475258554
## 39 StrainB_D2_R5 0.44500111374766
## 40 StrainB_D3_R3 0.368193551096942
## 41 StrainA_D2_R3 0.402055500910083
## 42 StrainC_D2_R2 0.330191038717702
##
## median
## sample median
## 1 StrainC_D2_R1 -1.027005550312
## 2 StrainB_D1_R1 0.22220516676917
## 3 StrainC_D1_R2 0.196373098918481
## 4 StrainA_D1_R2 0.295091657180997
## 5 StrainB_D2_R4 -0.0753235457617976
## 6 ---- ----
## 38 StrainB_D1_R3 -0.0872785367349405
## 39 StrainB_D2_R5 0.251012917359553
## 40 StrainB_D3_R3 -0.256438764618135
## 41 StrainA_D2_R3 0.155072201518131
## 42 StrainC_D2_R2 0.133396561245203
##
## pct_obs
## sample pct_obs
## 1 StrainC_D2_R1 0.597928197156297
## 2 StrainB_D1_R1 0.606263652323073
## 3 StrainC_D1_R2 0.606719288136047
## 4 StrainA_D1_R2 0.606779592876
## 5 StrainB_D2_R4 0.606015732836601
## 6 ---- ----
## 38 StrainB_D1_R3 0.654735262191608
## 39 StrainB_D2_R5 0.654969780624757
## 40 StrainB_D3_R3 0.654715160611624
## 41 StrainA_D2_R3 0.655036785891371
## 42 StrainC_D2_R2 0.654989882204741
##
## minimum
## sample min
## 1 StrainC_D2_R1 -5.28803117525946
## 2 StrainB_D1_R1 -6.90296758852084
## 3 StrainC_D1_R2 -7.88599163997331
## 4 StrainA_D1_R2 -9.0641588168261
## 5 StrainB_D2_R4 -8.98433240365627
## 6 ---- ----
## 38 StrainB_D1_R3 -10.5926699621308
## 39 StrainB_D2_R5 -7.14495131542973
## 40 StrainB_D3_R3 -7.99796318332263
## 41 StrainA_D2_R3 -6.31873411525732
## 42 StrainC_D2_R2 -6.2030259000209
##
## maximum
## sample max
## 1 StrainC_D2_R1 3.20201396788793
## 2 StrainB_D1_R1 3.32745017274159
## 3 StrainC_D1_R2 3.35107088869222
## 4 StrainA_D1_R2 4.63459394111101
## 5 StrainB_D2_R4 3.55081326856828
## 6 ---- ----
## 38 StrainB_D1_R3 2.48031626920593
## 39 StrainB_D2_R5 3.47439261166162
## 40 StrainB_D3_R3 2.47089345654322
## 41 StrainA_D2_R3 3.86159378287714
## 42 StrainC_D2_R2 2.47463437858901
#edata_summary(omicsData = norm_data, by = "molecule", groupvar = "Condition")
#edata_summary(omicsData = norm_data, by = "molecule", groupvar = NULL)
Missing Data
Patterns of missing data can be explored using the
missingval_result and plot_missingval
functions.
results <- missingval_result(omicsData = mypep)
plot(naRes_obj = results, omicsData = mypep, plot_type = "bar")

plot(naRes_obj = results, omicsData = mypep, plot_type = "scatter")

Normalize Data
After quality control, filtering, and exploring our data, the next
step in a typical workflow is to normalize the data. See the “Data
Normalization” vignette for more details about the normalization
approaches included in pmartR.
Since we are working with a global proteomics dataset, we will use
the SPANS algorithm to guide our selection of normalization approach and
then normalize the data. Datasets with only a couple hundred
biomolecules or fewer are not candidates for SPANS. Note that running
SPANS can take up to a couple of minutes or more.
# returns a data frame arranged by descending SPANS score
# not run due to long runtime; SPANS plot generated and saved to include in vignette
spans_result <- spans_procedure(mypep)
plot(spans_result)
Plotting the SPANS object returns a heatmap where each cell in the
grid corresponds to one combination of subset and normalization
function. A dot in the cell indicates that the method had the highest
SPANS scores; ties are possible as we see below.

Particularly when there are ties in the SPANS scores, it can be
useful to see how many biomolecules are part of the subset(s) being
used. Basing a normalization on too few biomolecules, say fewer than
10%, is not advised. The column in the SPANSRes object
called “mols_used_in_norm” provides the number of biomolecules in the
subset. In our case, the subset RIP (0.25) contains 20753 peptides, so
we will select this subset. Since all normalization functions resulted
in the same SPANS score for this subset, we will choose median.
Similar to normalizing to the reference pool, we will first create a
normRes object so we can see the effect of applying the specified
normalization ahead of actually applying it to our data.
#readRDS("/vignettes/spans_result.RDS")
mynorm <- normalize_global(omicsData = mypep,
subset_fn = "rip", params = list(rip = 0.25),
norm_fn = "median",
apply_norm = FALSE)
# plot(mynorm) # currently errors out
Now that we are satisfied with our selections, we can apply the
normalization
# apply the normalization
mynorm <- normalize_global(omicsData = mypep,
subset_fn = "rip", params = list(rip = 0.25),
norm_fn = "median",
apply_norm = FALSE)
At this point it may be of interest to re-evaluate some of the
earlier exploratory analyses or plots.
Protein Quantification Methods
Protein quantification can be done using the
protein_quant function, either with or without accounting
for different isoforms of the proteins, which are also called
‘proteoforms’. Regardless of whether the user is accounting for protein
isoforms, they must specify a method for rolling peptides up to proteins
(one of “rollup”, “rollup”, “qrollup”, or “zrollup”) and a function to
use for combining the peptide-level data (either “mean” or “median”).
When the user does wish to account for protein isoforms, they may
utilize Bayesian Proteoform Quantification (BP-Quant) (Webb-Robertson et al. 2014) via the
bpquant function, and input those results as one of the
arguments to the protein_quant function.
The protein_quant function takes in an object of class
pepData (or isobaricpepData) and returns an
object of class proData.
Rollup Methods
The rollup method takes either the mean or median of all peptides
mapping to a given protein, and sets that value as the relative protein
abundance.
In the rrollup method, all peptides that map to a single protein are
scaled based on a chosen reference peptide, which is the peptide with
most observations. Next the average or mean of the scaled peptides is
set as the relative protein abundance. (Matzke et
al. 2013)
The qrollup method starts with all peptides that map to a single
protein. Next peptides are chosen according to an abundance cutoff value
and the average or mean of the scaled peptides is set as the protein
abundance.
In the zrollup method, scaling is done similarly to the z-score
formula (except with medians instead of means). The scaling formula is
applied to peptides that map to a single protein and then the mean or
median of the scaled peptides is set as protein abundance (from DAnTE
article). The rollup method is similar to rrollup method, except there
is no scaling involved in these methods. Either the mean or median is
applied to all peptides that map to a single protein to obtain protein
abundance. (Polpitiya et al. 2008)
Here we utilize rrollup.
mypro <- protein_quant(pepData = mypep,
method = "rrollup",
combine_fn = "median")
plot(mypro, order_by = "Virus", color_by = "Virus")

Statistical Comparisons
Now that we have data at the protein level, we will perform the
comparisons of interest for these data, namely all pairwise comparisons
between the virus strains. Note that this dataset is included in the
pmartRdata package and used as an example only. Information
about Donor’s is included for exemplar usage of a second main effect, if
desired. A valid statistical analysis for an experiment that involved
Donor’s or other units of repeated measurements cannot be performed
within pmartR. To account for random or mixed effects,
please consult with a statistician as needed. We commonly use the
lme4 library for this type of statistical analysis (Bates et al. 2015).
To make all pairwise comparisons between our 3 virus strains using
both ANOVA and the IMD-test (or g-test), we can use the
imd_anova function in the following way using defaults (no
p-value adjustments being the most important default to note). The
summary function provides an overview of the results of
these tests.
mystats <- imd_anova(omicsData = mypro, test_method = "combined")
## Joining, by = "Protein"
summary(mystats)
## Type of test: combined
##
## Multiple comparison adjustment ANOVA: none
##
## Multiple comparison adjustment G-test: none
##
## p-value threshold: 0.05
##
## Number of significant biomolecules by comparison. Columns specify fold change direction and type of test:
##
## Comparison Total:Positive Total:Negative Positive:ANOVA
## 1 StrainC_vs_StrainB 126 1999 126
## 2 StrainC_vs_StrainA 39 207 38
## 3 StrainB_vs_StrainA 1692 297 1691
## Negative:ANOVA Positive:G-test Negative:G-test
## 1 1999 0 0
## 2 206 1 1
## 3 296 1 1
We can also use plot to obtain graphical results.
plot(mystats, plot_type = "volcano")
## Joining, by = c("Protein", "Comparison")

plot(mystats, plot_type = "bar")

plot(mystats, plot_type = "gheatmap") # exclude this one?
## Joining, by = c("Protein", "Comparison")
## Joining, by = c("Count_First_Group", "Count_Second_Group", "Comparison")

plot(mystats, plot_type = "fcheatmap")
## Joining, by = c("Protein", "Comparison")

See ?imd_anova for more details and options.
References
Ahmed, SE. 1995. “A Pooling Methodology for Coefficient of
Variation.” Sankhyā: The Indian Journal of
Statistics, Series B, 57–75.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015.
“Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical
Software 67 (1): 1–48.
https://doi.org/10.18637/jss.v067.i01.
Matzke, Melissa M, Joseph N Brown, Marina A Gritsenko, Thomas O Metz,
Joel G Pounds, Karin D Rodland, Anil K Shukla, et al. 2013. “A
Comparative Analysis of Computational Approaches to Relative Protein
Quantification Using Peptide Peak Intensities in Label-Free LC-MS
Proteomics Experiments.” Proteomics 13 (3-4): 493–503.
Matzke, Melissa M, Katrina M Waters, Thomas O Metz, Jon M Jacobs, Amy C
Sims, Ralph S Baric, Joel G Pounds, and Bobbie-Jo M Webb-Robertson.
2011. “Improved Quality Control Processing of Peptide-Centric
LC-MS Proteomics Data.” Bioinformatics 27 (20): 2866–72.
Polpitiya, Ashoka D, Wei-Jun Qian, Navdeep Jaitly, Vladislav A Petyuk,
Joshua N Adkins, David G Camp, Gordon A Anderson, and Richard D Smith.
2008. “DAnTE: A Statistical Tool for Quantitative Analysis
of-Omics Data.” Bioinformatics 24 (13): 1556–58.
Stacklies, Wolfram, Henning Redestig, Matthias Scholz, Dirk Walther, and
Joachim Selbig. 2007. “pcaMethods—a Bioconductor Package Providing
PCA Methods for Incomplete Data.” Bioinformatics 23 (9):
1164–67.
Webb-Robertson, Bobbie-Jo M, Melissa M Matzke, Susmita Datta, Samuel H
Payne, Jiyun Kang, Lisa M Bramer, Carrie D Nicora, et al. 2014.
“Bayesian Proteoform Modeling Improves Protein Quantification of
Global Proteomic Measurements.” Molecular & Cellular
Proteomics 13 (12): 3639–46.